Which bridge estimator is optimal for variable selection?
Authors
Abstract
We study the problem of variable selection for linear models under the high-dimensional asymptotic setting, where the number of observations n grows at the same rate as the number of predictors p. We consider two-stage variable selection techniques (TVS) in which the first stage uses bridge estimators to obtain an estimate of the regression coefficients, and the second stage simply thresholds the estimated coefficients to select the “important” predictors. The asymptotic false discovery proportion (AFDP) and true positive proportion (ATPP) of these TVS are evaluated. We prove that for a fixed ATPP, in order to obtain the smallest AFDP one should pick an estimator that minimizes the asymptotic mean square error in the first stage of TVS. This simple observation enables us to evaluate and compare the performance of different TVS with each other and with some standard variable selection techniques, such as LASSO and Sure Independence Screening. For instance, we prove that a TVS with LASSO in its first stage can outperform LASSO (only one stage) over a large range of ATPP. Furthermore, we show that for large values of noise, a TVS with ridge in its first stage outperforms TVS with other bridge estimators, including the one that has LASSO in its first stage.
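The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It assumes ridge (the bridge estimator with q = 2) in the first stage because it has a closed form, and it measures the finite-sample false discovery proportion and true positive proportion rather than their asymptotic limits. The function names, the Gaussian design, the signal strength, and the values of lam and tau are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def ridge_estimate(X, y, lam):
    """Stage 1 (assumed choice): ridge, i.e. the bridge estimator with q = 2."""
    p = X.shape[1]
    # Closed-form solution of min ||y - X b||^2 + lam * ||b||^2
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def threshold_select(beta_hat, tau):
    """Stage 2: keep predictors whose estimate exceeds tau in magnitude."""
    return np.abs(beta_hat) > tau

def fdp_tpp(selected, beta_true):
    """Finite-sample false discovery and true positive proportions."""
    truth = beta_true != 0
    fdp = (selected & ~truth).sum() / max(selected.sum(), 1)
    tpp = (selected & truth).sum() / max(truth.sum(), 1)
    return float(fdp), float(tpp)

def simulate(n=400, p=200, k=20, sigma=1.0, lam=1.0, tau=0.5, seed=0):
    """Hypothetical experiment: n and p comparable, k nonzero coefficients."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p)) / np.sqrt(n)   # illustrative Gaussian design
    beta = np.zeros(p)
    beta[:k] = 5.0                             # illustrative signal strength
    y = X @ beta + sigma * rng.normal(size=n)
    beta_hat = ridge_estimate(X, y, lam)
    return fdp_tpp(threshold_select(beta_hat, tau), beta)

if __name__ == "__main__":
    # Sweeping tau traces the trade-off between FDP and TPP.
    for tau in (0.0, 1.0, 2.0, 3.0):
        print(tau, simulate(tau=tau))
```

Raising the threshold tau lowers both proportions, so sweeping it traces a curve of false discovery proportion against true positive proportion. Comparing such curves for different first-stage estimators at the same true positive proportion mirrors the kind of comparison the abstract describes.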
Similar resources
Feature Selection in High-Dimensional Classification
High-dimensional discriminant analysis is of fundamental importance in multivariate statistics. Existing theoretical results sharply characterize different procedures, providing sharp convergence results for the classification risk, as well as the l2 convergence results to the discriminative rule. However, sharp theoretical results for the problem of variable selection have not been established...
Variable selection in the accelerated failure time model via the bridge method.
In high throughput genomic studies, an important goal is to identify a small number of genomic markers that are associated with development and progression of diseases. A representative example is microarray prognostic studies, where the goal is to identify genes whose expressions are associated with disease free or overall survival. Because of the high dimensionality of gene expression data, s...
OPTIMAL SELECTION OF NUMBER OF RAINFALL GAUGING STATIONS BY KRIGING AND GENETIC ALGORITHM METHODS
In this study, optimum combinations of available rainfall gauging stations are selected by a model that consists of a geostatistical model as an estimator and an optimization model. First, the watershed is approximated by several regular geometric shapes. Then kriging calculates the variance ...
Variable selection for optimal treatment decision.
In decision-making on optimal treatment strategies, it is of great importance to identify variables that are involved in the decision rule, i.e. those interacting with the treatment. Effective variable selection helps to improve the prediction accuracy and enhance the interpretability of the decision rule. We propose a new penalized regression framework which can simultaneously estimate the opt...
Discrete-time repetitive optimal control: Robotic manipulators
This paper proposes a discrete-time repetitive optimal control of electrically driven robotic manipulators using an uncertainty estimator. The proposed control method can be used for performing repetitive motion, which covers many industrial applications of robotic manipulators. This kind of control law is in the class of torque-based control in which the joint torques are generated by permanen...
Journal title:
- CoRR
Volume abs/1705.08617 Issue
Pages -
Publication date 2017